Taint Assisted Concolic Execution (TACE) utilizes the concept of taint in symbolic execution to identify all sets of dependent symbols. TACE can evaluate a subset of these sets with a significantly reduced testing effort by concretizing some symbols from selected subsets. The remaining subsets are explored with symbolic values. TACE optimizes symbolic constraints, facilitating a streamlined symbolic execution process that emphasizes the critical aspects of program behavior. Doing so contributes to a more practical and scalable approach for identifying vulnerabilities, generating comprehensive test suites, and aiding software security analysis. TACE stands out as an ideal tool for uncovering real vulnerabilities in applications, owing to its substantial enhancements in the realms of constraint solving and fuzzing. This makes it a highly effective asset in the practice of software analysis.
The TACE paper has been accepted for publication in the 32nd ACM International Conference on the Foundations of Software Engineering (FSE 2024).
Cite paper (ACM digital library): https://dl.acm.org/doi/10.1145/3617555.3617874
@inproceedings{10.1145/3663529.3663812,
author = {Jain, Ridhi and Tihanyi, Norbert and Ndhlovu, Mthandazo and Ferrag, Mohamed Amine and Cordeiro, Lucas C.},
title = {Rapid Taint Assisted Concolic Execution (TACE)},
year = {2024},
isbn = {9798400706585},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
url = {https://doi.org/10.1145/3663529.3663812},
doi = {10.1145/3663529.3663812},
abstract = {While fuzz testing is a popular choice for testing open-source software, it might not effectively detect bugs in programs that feature many symbols due to the significant increase in exploration of the program executions. Fuzzers can be more effective when they concentrate on a smaller and more relevant set of symbols, focusing specifically on the key executions. We present rapid Taint Assisted Concolic Execution (TACE), which utilizes the concept of taint in symbolic execution to identify all sets of dependent symbols. TACE can evaluate a subset of these sets with a significantly reduced testing effort by concretizing some symbols from selected subsets. The remaining subsets are explored with symbolic values. TACE significantly enhances speed, achieving a 50x constraint-solving time improvement over SymQEMU in binary applications. In our fuzzing campaign, we tested five popular open-source libraries (minizip-ng, TPCDump, GifLib, OpenJpeg, bzip2) and identified a new heap buffer overflow in the latest version of GifLib 5.2.1 with an assigned CVE-2023-48161 number. Under identical conditions and hardware environments, SymCC could not identify the same issue, underscoring TACE's enhanced capability in quickly discovering real-world vulnerabilities.},
booktitle = {Companion Proceedings of the 32nd ACM International Conference on the Foundations of Software Engineering},
pages = {627–631},
numpages = {5},
keywords = {Fuzzing, Symbolic Execution, Taint Assisted Concolic Execution},
location = {Porto de Galinhas, Brazil},
series = {FSE 2024}
}
The above figure illustrates the hybrid fuzzing architecture of TACE. The input source project is compiled with TACE and AFL-Clang to generate a TACE-instrumented binary and an AFL-instrumented binary, respectively. As TACE uses the underlying architecture of SymCC, it uses SymCChelper to generate inputs from the TACE-instrumented binary. AFL generates the initial seed for the AFL-instrumented binary. The AFL-instrumented binary is fuzzed to explore unique crashes and generate new inputs. The seeds generated by SymCChelper are only retained if they explore previously unseen paths. The collected seeds are used to fuzz the input program and generate new seeds.
A Docker container has been created for TACE, which is built upon SymCC, QEMU, and LLVM. This approach ensures a streamlined and user-friendly setup process, as the containerization effectively manages compatibility and dependency issues. During the build phase, the necessary dependencies are installed, and TACE patches are applied to the relevant components. Tested on Ubuntu LTS 22.04.1
-
Install Docker Engine. For Ubuntu-specific guidance, refer to https://docs.docker.com/engine/install/ubuntu/ for detailed information.
-
Clone the git repository of TACE:
git clone https://github.com/tacetool/TACE
-
Build the docker container:
cd TACE/tace && sudo docker build -t tace -f TACE_docker .
-
Run and test the docker
sudo docker run -it --rm tace /bin/bash
The above steps have successfully built the TACE Docker container, which is now ready for use. All necessary patches are applied. Updating the Dockerfile to use other source repos is possible.
Once the command sudo docker run -it --rm tace /bin/bash
is executed, the environment becomes ready for use.
symex2.c
has been designed for demonstration purposes, closely mirroring real-world projects with its numerous if-then-else branches that make achieving complete code coverage a challenging task.
Example source code for symex2.c
:
//symex2.c
#include <stdio.h>
#include <stdlib.h>
int main(int argc, char *argv[])
{
char *file = argv[1];
char str[2000];//tainted buffer
int fsize;
int check=0, i;
FILE *fp = fopen(file, "r");
fseek(fp, 0, SEEK_END); // seek to end of file
fsize = ftell(fp); // get current file pointer
fseek(fp, 0, SEEK_SET); // seek back to beginning of file
if(fsize<40 || fsize>1999)
return -1;
fread(str, sizeof(char), fsize, (FILE*)fp);
for(i=0;i<(fsize-6);i++)
{
if((str[i]+str[i+1])<(str[i+3]+str[i+4]+str[i+5]))
check++;
}
// push constraint from loop
if(check<1)
{
printf("too many large chars. Exiting!\n");
return -1;
}
if(str[7] + str[5] == 'R')
{
printf("branch 1\n");
if(str[18] + str[19] == 'b')
{
//some code
printf("branch 2\n");
if(str[2] + str[4] == 'X')
{
printf("unrelated branch 1\n");
}
if(str[4] + str[8] == 'X')
{
printf("unrelated branch 2\n");
}
if(str[15] + str[18] == 'U')
{
printf("branch 3\n");
if(str[5] + str[9] == 'X')
{
printf("unrelated branch 3\n");
}
if(str[11] + str[24] == 'X')
{
printf("unrelated branch 4\n");
}
if(str[15] + str[14] == 'g')
{
printf("vulnerable branch 4\n");
//vulnerable_function(&str[30]);
abort();
}
else
printf("branch 5\n");
}
else
printf("branch 6\n");
}
}
printf("branch 0\n");
return 0;
}
After running Docker, the WORKDIR is established as the /tmp
folder. Here, you will find the previously mentioned source code. The backend typically manages input from STDIN by default. However, to designate a specific file as symbolic, you can set the SYMCC_INPUT_FILE
variable. For changing where output files are saved, use the SYMCC_OUTPUT_DIR
variable
To run a simple case (where the workdir is /tmp
), use the following command:
cc -O2 symex2.c -o test.elf
echo 'AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA' > input
SYMCC_INPUT_FILE=/tmp/input SYMCC_OUTPUT_FOLDER=/tmp/output taceqemu /tmp/test.elf /tmp/input
Example of output:
FOR TREE 0x55e94ea2a040 :: SYMDEP [ ], CONCRETE DEP [ ]
[STAT] SMT: {"solving_time_elapsed": 1242 }
[INFO] New testcase: /tmp/output/000000
[Solver::addJcc] isInteresting=true
FOR TREE 0x55e94ea01650 :: SYMDEP [ 1 4 5 ], CONCRETE DEP [ 0 3 ]
FOR TREE 0x55e94ea2a040 :: SYMDEP [ 1 4 5 ], CONCRETE DEP [ 0 3 ]
FOR TREE 0x55e94eb63420 :: SYMDEP [ 1 4 5 ], CONCRETE DEP [ 0 3 ]
[STAT] SMT: {"solving_time_elapsed": 1429 }
[INFO] New testcase: /tmp/output/000001
[Solver::addJcc] isInteresting=true
FOR TREE 0x55e94ea01650 :: SYMDEP [ 2 3 5 6 ], CONCRETE DEP [ 0 1 4 ]
FOR TREE 0x55e94eb29150 :: SYMDEP [ 2 3 5 6 ], CONCRETE DEP [ 0 1 4 ]
[STAT] SMT: {"solving_time_elapsed": 1686 }
[INFO] New testcase: /tmp/output/000002
[Solver::addJcc] isInteresting=true
FOR TREE 0x55e94ea01650 :: SYMDEP [ 3 4 6 7 ], CONCRETE DEP [ 0 1 2 5 ]
FOR TREE 0x55e94eb26290 :: SYMDEP [ 3 4 6 7 ], CONCRETE DEP [ 0 1 2 5 ]
[STAT] SMT: {"solving_time_elapsed": 1905 }
[INFO] New testcase: /tmp/output/000003
[Solver::addJcc] isInteresting=false
Printed log demonstrates state of the current tree, location of the test case, solving time, and considered dependencies.
Build LibCap with TACE.
mkdir /tace_build
cd /tace_build
git clone https://github.com/the-tcpdump-group/libpcap.git
cd libpcap && ./autogen.sh
CC=/symcc_build/tace ./configure
make && make install
Build TCPDump with TACE.
git clone https://github.com/the-tcpdump-group/tcpdump.git
cd tcpdump &&./autogen.sh
CC=/symcc_build/tace ./configure
make && make install
Similarly, build LibCap and TCPDump with AFL-clang.
mkdir /afl_build
cd /afl_build
git clone https://github.com/the-tcpdump-group/libpcap.git
git clone https://github.com/the-tcpdump-group/tcpdump.git
export AFL_USE_ASAN=1
cd libpcap
./autogen.sh
CC=/afl/afl-clang ./configure
make && make install
cd /afl_build/tcpdump/
./autogen.sh
CC=/afl/afl-clang ./configure
make && make install
Run the Fuzz campaign with TACE.
mkdir /corpus && mkdir /fuzz_res && mkdir /fuzz_res/afl_out/
echo "AAAAAAAA" > /corpus/seed
/afl/afl-fuzz -M afl-master -i /corpus/ -o /fuzz_res/afl_out/ -m none -- /afl_build/tcpdump/tcpdump -e -r @@
/afl/afl-fuzz -S afl-secondary -i corpus/ -o /fuzz_res/afl_out/ -m none -- /afl_build/tcpdump/tcpdump -e -r @@
~/.cargo/bin/symcc_fuzzing_helper -o /fuzz_res/afl_out/ -a afl-secondary -n tace -- /tace_build/tcpdump/tcpdump -e -r @@
A heap buffer overflow vulnerability exists in the DumpScreen2RGB function within the gif2rgb.c component of GifLib 5.2.1 specifically between lines 321 and 323. The flaw can be exploited when handling a specially crafted GIF during the image-saving process. It is important to note that this issue is distinct from CVE-2022-28506. While the [5b74cd] commit effectively addresses CVE-2022-28506, it does not provide a resolution for this particular heap-buffer overflow problem.
The crash reproduction files and data is available in folder TACE/tace/giflib-crash/giflib521/
.
The heap buffer overflow detected by TACE is easily replicable using the provided Docker image.
To replicate the CVE-2023-48161 vulnerability, execute the POC_crash_docker
from the TACE/tace/giflib-crash/giflib521/
directory.
STEP 1 (Build the Docker image):
sudo docker build --rm -t giflib-vuln -f POC_crash_docker .
STEP 2 (Running the container):
sudo docker run -it --rm giflib-vuln /bin/bash
STEP 3 (Running rhe PoC bash script):
./poc.sh
Address sanitizer output (Heap-based Overflow)
-
crashes.zip ==> contains the sample which can trigger the crash.
-
poc.sh ==> contains a bash script that executes the vulnerable gif2rgb program with the input that triggers the crash.
-
POC_crash_docker ==> dockerfile to create the environment and setup giflib 5.2.1 for easy bug replication.
Environment: Ubuntu 22.04 LTS
wget https://yer.dl.sourceforge.net/project/giflib/giflib-5.2.1.tar.gz
tar -xf giflib-5.2.1.tar.gz
cd giflib-5.2.1
make CFLAGS="-std=gnu99 -fPIC -Wall -Wno-format-truncation -ggdb -fsanitize=address,undefined,leak,integer -fno-omit-frame-pointer"
Replicate the crash: ./giflib-5.2.1/gif2rgb -o out /tmp/crashes/sample
Extends the QEMU,SYMCC,SYMQEMU,SQYM , and our contributions to previously existing files adopt those files' respective licenses; the files that we have added are made available under the terms of the GNU General Public License as published by the Free Software Foundation, either version 2 of the License, or (at your option) any later version.